Hyperplane Clustering via Dual Principal Component Pursuit
Abstract
State-of-the-art methods for clustering data drawn from a union of subspaces are based on sparse and low-rank representation theory. Existing results guaranteeing the correctness of such methods require the dimension of the subspaces to be small relative to the dimension of the ambient space. When this assumption is violated, as it is, for example, in the case of hyperplanes, existing methods are either too computationally intensive (e.g., algebraic methods) or lack theoretical support (e.g., K-hyperplanes or RANSAC). The main theoretical contribution of this paper is to extend the theoretical analysis of a recently proposed single subspace learning algorithm, called Dual Principal Component Pursuit (DPCP), to the case where the data are drawn from a union of hyperplanes. To gain insight into the expected properties of the non-convex ℓ1 problem associated with DPCP (the discrete problem), we develop a geometric analysis of a closely related continuous optimization problem. Transferring this analysis to the discrete problem, our results state that as long as the hyperplanes are sufficiently separated, the dominant hyperplane is sufficiently dominant, and the points are uniformly distributed (in a deterministic sense) inside their associated hyperplanes, the non-convex DPCP problem has a unique (up to sign) global solution, equal to the normal vector of the dominant hyperplane. This suggests a sequential hyperplane learning algorithm, which first learns the dominant hyperplane by applying DPCP to the data. To avoid hard thresholding of the points, which is sensitive to the choice of the thresholding parameter, all points are weighted according to their distance to that hyperplane, and a second hyperplane is computed by applying DPCP to the weighted data, and so on.
Experiments on corrupted synthetic data show that this DPCP-based sequential algorithm dramatically improves over similar sequential algorithms, which learn the dominant hyperplane via state-of-the-art single subspace learning methods (e.g., RANSAC or REAPER). Finally, 3D plane clustering experiments on real 3D point clouds show that a DPCP-based K-hyperplanes scheme, which computes the normal vector of each cluster via DPCP instead of the classic SVD, is very competitive with state-of-the-art approaches (e.g., RANSAC or SVD-based K-hyperplanes).
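The sequential scheme described in the abstract can be illustrated with a minimal sketch. It assumes the DPCP problem takes the form: minimize the sum over points of |x_j^T b| subject to ||b||_2 = 1, and it solves this with iteratively reweighted least squares (IRLS), a generic heuristic for ℓ1 problems chosen here for brevity; the abstract does not specify a solver. The function names `dpcp_irls` and `sequential_hyperplanes`, all parameters, and the rule of weighting each point by its distance to the nearest hyperplane found so far are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def dpcp_irls(X, weights=None, n_iter=100, delta=1e-9, tol=1e-10):
    """Approximately minimise sum_j weights[j] * |x_j^T b| subject to ||b||_2 = 1.

    X has shape (D, N) with one data point per column. IRLS is a heuristic
    for this non-convex problem and is not guaranteed to find the global
    minimiser. All names and defaults are illustrative assumptions.
    """
    D, N = X.shape
    c = np.ones(N) if weights is None else np.asarray(weights, dtype=float)

    # Initialise with the weighted least-squares normal: the eigenvector of
    # sum_j c_j x_j x_j^T with the smallest eigenvalue (eigh sorts ascending).
    _, V = np.linalg.eigh((X * c) @ X.T)
    b = V[:, 0]

    prev_obj = np.inf
    for _ in range(n_iter):
        r = np.abs(X.T @ b)              # distances of the points to the hyperplane
        obj = float(np.sum(c * r))       # current weighted l1 objective
        if prev_obj - obj < tol:         # stop once the objective stalls
            break
        prev_obj = obj
        w = c / np.maximum(r, delta)     # IRLS weights, floored to avoid division by zero
        _, V = np.linalg.eigh((X * w) @ X.T)
        b = V[:, 0]
    return b


def sequential_hyperplanes(X, n_planes, **kwargs):
    """Learn n_planes unit normals one after another, without hard thresholding.

    ASSUMPTION: after each step, every point is weighted by its distance to
    the closest hyperplane found so far. The abstract only states that points
    are weighted by their distance to the previously learned hyperplane.
    """
    N = X.shape[1]
    weights = np.ones(N)
    normals = []
    for _ in range(n_planes):
        b = dpcp_irls(X, weights=weights, **kwargs)
        normals.append(b)
        B = np.column_stack(normals)
        weights = np.min(np.abs(X.T @ B), axis=1)
    return np.column_stack(normals)
```

Calling `sequential_hyperplanes(X, 2)` on a D x N data matrix returns a D x 2 matrix whose columns are the estimated unit normals, each determined only up to sign. Points lying on an already learned hyperplane receive weights close to zero, so they contribute little to the next DPCP problem without any threshold having to be chosen.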
Similar resources
Dual Principal Component Pursuit
We consider the problem of outlier rejection in single subspace learning. Classical approaches work with a direct representation of the subspace, and are thus efficient when the subspace dimension is small. Our approach works with a dual representation of the subspace and hence aims to find its orthogonal complement; as such it is particularly suitable for high-dimensional subspaces. We pose th...
Improved Algorithm for Fully-automated Neural Spike Sorting based on Projection Pursuit and Gaussian Mixture Model
For the analysis of multiunit extracellular neural signals as multiple spike trains, neural spike sorting is essential. Existing algorithms for the spike sorting have been unsatisfactory when the signal-to-noise ratio (SNR) is low, especially for implementation of fully-automated systems. We present a novel method that shows satisfactory performance even under low SNR, and compare its performan...
A variational approach to stable principal component pursuit
We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the cl...
Projection Pursuit via Decomposition of Bias Terms of Kernel Density
Dimension reduction of data, ℝ^d → ℝ^p (p << d), to be used for clustering has specific requirements that are not generally met by generic dimension reduction algorithms such as principal components. Projection pursuit, on the other hand, has a growing variety of criteria that target holes, skewness, etc., using information measures, density functionals, sample moments, etc. With the exception o...
Document Retrieval and Clustering: from Principal Component Analysis to Self-aggregation Networks
We first extend Hopfield networks to clustering bipartite graphs (words-to-document association) and show that the solution is the principal component analysis. We then generalize this via the min-max clustering principle into self-aggregation networks, which are composed of scaled PCA components via the Hebb rule. Clustering amounts to an updating process where connections between different clust...